Spatial Transcriptomics Learned Non-negative Matrix Factorization (STL NMF)

STL NMF is the culmination of my research in Non-negative Matrix Factorization (NMF) and its application to the analysis of Spatial Transriptomics data. This research took place at Case Western Reserve University under the guidance of Dr. Weihong Guo and the endless support by fellow researcher Jiasen Zhang. Read more about the specifics of the project and resulting paper below.

Background:

Spatial transcriptomics is the study of spatial transcriptomics data (X) -- a count matrix of m genes in n spatial locations. This ST data is often aided by an RNA-seq reference dataset (B) -- an expression matrix of the same m genes in p single cells.

X = \left[ \begin{matrix} x_{1,1} & x_{1,2} & \ldots & x_{1,m} \\ x_{2,1} & x_{2,2} & \ldots & x_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \ldots & x_{n,m} \\ \end{matrix}\right]

B = \left[ \begin{matrix} b_{1,1} & b_{1,2} & \ldots & b_{1,p} \\ b_{2,1} & b_{2,2} & \ldots & b_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & \ldots & b_{n,p} \\ \end{matrix}\right]

Biologically, we know that genes and cells are linked through both position and function. Using this understanding, Patrik Ståhl et al. first proposed the use of Spatial Transcriptomics data to deconvolve a spatial mapping of genes into a spatial mapping of cell types (V).

There are a number of strategies to do this, each with its own strengths and weaknesses, but the most popular strategies include deep learning, probabilistic, optimal transport, and NMF based methods. My research focuses on NMF.

V = \left[ \begin{matrix} v_{1,1} & v_{1,2} & \ldots & v_{1,m} \\ v_{2,1} & v_{2,2} & \ldots & v_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ v_{p,1} & v_{p,2} & \ldots & v_{p,m} \\ \end{matrix}\right]

Deconvolution:

To set up a matrix factorization solution to the problem, we start by setting up the equlation X = BV + ε. From here, we aim to find the "true" V, given both X and B. Next, we utilize the fact that all three matrices are non-negative (each entry can be either zero or a positive real number). This provides us with a method to solve for V (eq.2).

X = BV + \varepsilon\quad(eq. 1)

\min_{V^T, B \geq 0} ||X - BV||_F^2\quad(eq. 2)

However, this still leaves us with an ill-posed optimization problem that struggles to find unique solutions. Luckily, we can apply biological priors to add regularization terms to the problem and assist in finding an optimal solution.

Biologically, locations that are close together and are within the same biological region should have genetic makeups much more similar than spots far away. Utilizing this, we can introduce an adjacency term within our objective function that limits the freedom of NMF and ensures similarities across similar locations and disparities across disparate locations. Further, we know that the the total proportion of all cell types within one location should sum to one. This means we can introduce a sum-to-one constraint on the columns of V.

After we add these regularization terms (and one more that is mostly inconsequetial), we end up with the following objective function (eq.3) and its resulting multiplicative update rule (eq.4).

\min_{V^T \geq 0} ||X - BV||_F^2 + \lambda_1 \textrm{Tr} \left(VLV^T\right) + \lambda_2 ||V^TJ-J_m||^2_F +\lambda_3 ||WV||^2_F\quad (eq.3)

V^T = \frac{V^T_{jk}\left(X^TB+\lambda_1AV^T+\lambda_2J_mJ_m^T\right)_{jk}}{\left(V^TB^TB+\lambda_1DV^T+\lambda_2V^TJJ^T+\lambda_3V^TW^TW\right)_{jk}} \quad(eq.4)

From here, we have our problem defined, set up, and ready to be solved. Start with an initialized (random) matrix V and iteratively update it using the multiplicative update rule (eq.4). Over time, this process causes V to converge to a stable solution that best approximates the "true" V. However, even this solution is imperfect and can be improved further. To do this, we can recognize that our regularization terms and update rule are highly reliant on good choices for our hyperparameters (λ1, λ2, λ3,W).

Algorithm Unrolling:

To solve this problem of reliance, we can utilize a strategy called algorithm unrolling, first proposed by Gregor and LeCun in 2010. This strategy consists of transforming our iterative optimization process into a (repetitive) single layered neural network that trains on the hyperparameters of our problem. Doing so, we can learn the optimal hyperparameters (λ1, λ2, λ3,W) and improve our solution past many SOTA methods.

A more complete technical description of our process, method, and results can be found in our paper Unrolling Regularized Non-negative Matrix Factorization for Spatial Transcriptomics Deconvolution.